The metaquant package provides functions for estimating means and standard deviations, and for visualising distributions, using quantile summary data. The package is designed for meta-analyses with continuous outcomes, particularly when only quantile-based information (e.g., medians, quartiles, or extremes) is available in the studies being analyzed. By using flexible quantile-based distributions, metaquant enables researchers to incorporate quantile summary measures into a comprehensive meta-analysis.
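The package deals with three common scenarios of reported quantile data, each with the sample size \(n\):
S\(_1\) = { \(a\), \(m\), \(b\), \(n\) }
S\(_2\) = { \(q_1\), \(m\), \(q_3\), \(n\) }
S\(_3\) = { \(a\), \(q_1\), \(m\), \(q_3\), \(b\), \(n\) }
where { \(a\), \(q_1\), \(m\), \(q_3\), \(b\) } denote the sample minimum, first quartile, median, third quartile and maximum, respectively.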
When the full 5-number summary is available (S\(_3\)), the package uses the Generalized Lambda Distribution (GLD), a flexible family of distributions capable of approximating many common distributions (e.g., normal, logistic, and log-normal). Specifically, the package uses the GLD under the FKML parameterisation (Freimer et al., 1988), which is defined by its quantile function.
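To make the idea of a distribution "defined by its quantile function" concrete, here is a minimal base-R sketch of the FKML quantile function, Q(u) = lambda1 + [(u^lambda3 - 1)/lambda3 - ((1 - u)^lambda4 - 1)/lambda4] / lambda2. The function names and parameter values below are illustrative assumptions and are not taken from metaquant; the mean is obtained by integrating Q(u) over (0, 1), which is finite when lambda3 and lambda4 are greater than -1.

```r
# FKML quantile function of the GLD (Freimer et al., 1988).
# lambda1: location, lambda2 > 0: inverse scale, lambda3 / lambda4: tail shapes.
qgld_fkml <- function(u, lambda1, lambda2, lambda3, lambda4) {
  # Each tail term (v^l - 1) / l tends to log(v) as l -> 0
  tail_term <- function(v, l) if (abs(l) < 1e-12) log(v) else (v^l - 1) / l
  lambda1 + (tail_term(u, lambda3) - tail_term(1 - u, lambda4)) / lambda2
}

# Mean of the distribution as the integral of its quantile function over (0, 1)
gld_mean <- function(lambda1, lambda2, lambda3, lambda4) {
  integrate(qgld_fkml, lower = 0, upper = 1,
            lambda1 = lambda1, lambda2 = lambda2,
            lambda3 = lambda3, lambda4 = lambda4)$value
}

# Illustrative (made-up) parameter values
pars <- list(lambda1 = 50, lambda2 = 0.1, lambda3 = 0.2, lambda4 = 0.1)
do.call(qgld_fkml, c(list(u = c(0.25, 0.5, 0.75)), pars))  # quartiles
do.call(gld_mean, pars)                                    # mean
```

With lambda3 = lambda4 = 0 the function reduces to the logistic quantile function, which is one way to check the implementation.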
When only 3-point summaries are available, as in Scenarios 1 and 2 above (S\(_1\) and S\(_2\)), the package uses the quantile-based skew logistic distribution (SLD) (van Staden and King, 2015). Based on these density-based approaches (GLD and SLD), as well as other existing methods, the package provides functions to estimate summary measures such as sample means and standard deviations from the quantile summaries and sample sizes.
In addition to the estimation of sample means and standard deviations, metaquant includes functions for visualising the estimated distributions of the sample data, covering all 3 scenarios above. The visualisation functions allow users to create density plots using only quantile summaries, enabling exploration of group differences, skewness, and heterogeneity across studies.
For more details on the methodology related to metaquant, refer to De Livera et al. (2024).
metaquant can be downloaded from CRAN as follows:
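A minimal sketch of the CRAN installation, assuming the package is published there under the name ‘metaquant’ (the name used throughout this document):

```r
# Install the released version from CRAN, then load it
install.packages("metaquant")
library(metaquant)
```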
Alternatively, the development version can be installed from GitHub. To install this version, make sure that Rtools has been installed and configured beforehand.
The function ‘est.mean’ estimates the sample mean of a study that reports one of the quantile summary scenarios, i.e., either a 3-point summary (S\(_1\) or S\(_2\)) or a 5-point summary (S\(_3\)).
‘est.mean’ implements the flexible quantile-based distribution methods for estimating sample means proposed by De Livera et al. (2024), as well as existing methods described by Luo et al. (2018) and McGrath et al. (2020).
The estimation methods implemented in the function are the following:
To illustrate the usage of the function, we first generate example 5-point summary data using the ‘rlnorm’ function in the ‘stats’ package.
#Generate quantile summary data
set.seed(123)
n <- 100
x <- rlnorm(n, 4, 0.3)
quants <- c(min(x), quantile(x, probs = c(0.25, 0.5, 0.75)), max(x))
quants
##                 25%       50%       75%
##  27.30990  47.07984  55.61930  67.19152 105.23542
Next, assuming ‘quants’ represents a 5-point summary from a study where we need to estimate the sample mean, the default ‘gld/sld’ method is used for the estimation.
# Estimate sample mean of S3 using 'gld/sld'
estmean_gl <- est.mean(min = quants[1],
                       q1 = quants[2],
                       med = quants[3],
                       q3 = quants[4],
                       max = quants[5],
                       n = n)
estmean_gl
## $mean
## [1] 57.59711
If one needs to estimate the sample mean using a defined alternative method, simply specify the method as follows.
# Estimate sample mean of S3 using the method 'luo'
estmean_luo <- est.mean(min = quants[1],
                        q1 = quants[2],
                        med = quants[3],
                        q3 = quants[4],
                        max = quants[5],
                        n = n,
                        method = "luo")
estmean_luo
## $mean
## [1] 57.28699
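To give a feel for what a non-density-based method does, the following is a rough sketch of the 5-point (S\(_3\)) estimator of Luo et al. (2018), which forms a weighted average of the mid-range, the mid-quartile range and the median with weights that depend on the sample size. The weights are written as recalled from that paper and the helper name ‘luo_mean_s3’ is hypothetical; this is not metaquant's source code.

```r
# Sketch of the Luo et al. (2018) sample mean estimator for a 5-point summary.
# Hypothetical helper; the weights follow the approximations published in the paper.
luo_mean_s3 <- function(min_x, q1, med, q3, max_x, n) {
  w1 <- 2.2 / (2.2 + n^0.75)       # weight on the mid-range
  w2 <- 0.7 - 0.72 / n^0.55        # weight on the mid-quartile range
  w1 * (min_x + max_x) / 2 +
    w2 * (q1 + q3) / 2 +
    (1 - w1 - w2) * med            # remaining weight on the median
}

# 5-point summary printed above (rounded as shown in the output)
luo_mean_s3(27.30990, 47.07984, 55.61930, 67.19152, 105.23542, n = 100)
```

For the example data this comes out close to the ‘luo’ estimate reported above.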
Suppose only the minimum, median, and maximum are available as your quantile summaries.
# 3-point summary (S1): minimum, median and maximum
quants1 <- quants[c(1, 3, 5)]
quants1
##               50%
##  27.3099  55.6193 105.2354
Then, instead of providing all five quantile inputs to the function, you can use the arguments ‘min’, ‘med’, and ‘max’ along with the sample size ‘n’ to estimate the sample mean. To estimate the sample mean of S\(_1\) using the ‘gld/sld’ method:
# Estimate sample mean for S1
estmean_sl_1 <- est.mean(min = quants1[1],
                         med = quants1[2],
                         max = quants1[3],
                         n = n,
                         method = "gld/sld")
estmean_sl_1
## $mean
## [1] 57.28843
As before, the ‘method’ argument can be adjusted to use a different estimation method.
If only the first quartile, median, and third quartile are available, use the ‘q1’, ‘med’ and ‘q3’ arguments of the function.
# 3-point summary (S2): first quartile, median and third quartile
quants2 <- quants[2:4]
quants2
##      25%      50%      75%
## 47.07984 55.61930 67.19152
# Estimate sample mean for S2
estmean_sl_2 <- est.mean(q1 = quants2[1],
                         med = quants2[2],
                         q3 = quants2[3],
                         method = "gld/sld")
estmean_sl_2
## $mean
## [1] 58.85415
Note that the ‘gld/sld’ method under S\(_2\) does not require the sample size to estimate the sample mean, so the argument ‘n’ can be omitted in this case. All other methods require the sample size for the estimation.
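One way to see why no sample size is needed under S\(_2\): the SLD has three parameters, and three quartiles are enough to pin them down exactly. The sketch below assumes the SLD quantile function Q(u) = lambda + eta * [(1 - delta) * log(u) - delta * log(1 - u)] of van Staden and King (2015) and solves for the parameters in closed form. The helper ‘fit_sld_quartiles’ is hypothetical and is not necessarily how metaquant fits the distribution.

```r
# SLD quantile function: location lambda, scale eta > 0, skewing delta in [0, 1]
qsld <- function(u, lambda, eta, delta) {
  lambda + eta * ((1 - delta) * log(u) - delta * log(1 - u))
}

# Hypothetical helper: SLD passing exactly through the three quartiles
fit_sld_quartiles <- function(q1, med, q3) {
  eta  <- (q3 - q1) / log(3)                         # IQR of the SLD is eta * log(3)
  skew <- (q1 + q3 - 2 * med) / (eta * log(4 / 3))   # equals 2 * delta - 1
  if (abs(skew) > 1) warning("quartiles too skewed for an exact SLD fit")
  list(lambda = med - eta * skew * log(2),           # from Q(0.5) = med
       eta    = eta,
       delta  = (1 + skew) / 2,
       mean   = med + eta * skew * (1 - log(2)))     # E[X] = lambda + eta * (2 * delta - 1)
}

# Quartiles of the example data (rounded as printed above)
fit <- fit_sld_quartiles(q1 = 47.07984, med = 55.61930, q3 = 67.19152)
fit$mean
qsld(c(0.25, 0.5, 0.75), fit$lambda, fit$eta, fit$delta)  # reproduces the quartiles
```

For this example the resulting mean is close to the ‘gld/sld’ estimate shown above, and the sample size never enters the calculation.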
For completeness, the package provides the function ‘est.sd’, which estimates the sample standard deviation for a study reporting quantile summary data. This includes 3-point summaries (S\(_1\) or S\(_2\)) and 5-point summaries (S\(_3\)). While the function operates similarly to ‘est.mean’, it uses estimation methods specific to the standard deviation.
The following methods for estimating the standard deviation are implemented in the function:
The method of Shi et al. (2020) is set as the default estimation option in the function.
For example, to estimate the sample standard deviation from a given 5-point summary, ‘est.sd’ can be applied by providing the quantiles and the sample size as inputs. We use the same data ‘quants’ as in section 3.1, with the default option ‘shi/wan’ as the estimation method.
# Estimate sample SD of S3 using 'shi/wan' method
estsd_shi <- est.sd(min = quants[1],
                    q1 = quants[2],
                    med = quants[3],
                    q3 = quants[4],
                    max = quants[5],
                    n = n)
estsd_shi
## $sd
## [1] 15.34892
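As a rough illustration of the kind of calculation behind the default option for S\(_3\), the sketch below follows the five-number-summary estimator of Shi et al. (2020) as recalled from that paper: the range and the interquartile range are each scaled by their approximate expected value under normality and then combined with a weight that depends on the sample size. The helper name ‘shi_sd_s3’ is hypothetical and the code is not taken from metaquant.

```r
# Sketch of the Shi et al. (2020) SD estimator for a 5-point summary.
# Hypothetical helper; the median is not used by this estimator.
shi_sd_s3 <- function(min_x, q1, q3, max_x, n) {
  xi  <- 2 * qnorm((n - 0.375) / (n + 0.25))         # approx. expected range of n standard normals
  eta <- 2 * qnorm((0.75 * n - 0.125) / (n + 0.25))  # approx. expected IQR of n standard normals
  w   <- 1 / (1 + 0.07 * n^0.6)                      # weight on the range-based estimate
  w * (max_x - min_x) / xi + (1 - w) * (q3 - q1) / eta
}

# 5-point summary of the example data (rounded as printed in section 3.1)
shi_sd_s3(27.30990, 47.07984, 67.19152, 105.23542, n = 100)
```

For the example data this comes out close to the ‘shi/wan’ estimate reported above.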
In addition to the functions ‘est.mean’ and ‘est.sd’, the package provides two functions, ‘est.mean.2g’ and ‘est.sd.2g’, for estimating the sample mean and standard deviation in two-group studies based on quantile summary measures. These functions specifically use the GLD or SLD methods, as the other estimation methods do not have variants for two-group cases.
In particular, these functions implement the method proposed by De Livera et al. (2024) for two-group cases. The approach uses the Generalized Lambda Distribution (GLD) for 5-number summaries (S\(_3\)) and the Skew Logistic Distribution (SLD) for 3-number summaries (S\(_1\) and S\(_2\)), and incorporates shared information across the two groups to improve the accuracy of the estimates.
As a result, these two functions do not require a ‘method’ argument to be specified. However, they include additional arguments for the summary measures of the second group.
For instance, consider the following quantile summaries for two groups. In this case, the ‘rexp’ function from the ‘stats’ package is used to generate example samples with exponential distributions. You may need to load the necessary libraries if they are not already loaded.
# Generate 5-point summary data for two groups
set.seed(123)
n_t <- 100
n_c <- 120
x_t <- rexp(n_t, 5)
x_c <- rexp(n_c, 10)
q_t <- c(min(x_t), quantile(x_t, probs = c(0.25, 0.5, 0.75)), max(x_t))
q_c <- c(min(x_c), quantile(x_c, probs = c(0.25, 0.5, 0.75)), max(x_c))
As in the single-group case, the ‘est.mean.2g’ and ‘est.sd.2g’ functions can be applied as below.
# Estimate sample mean of S3
estmean_2g <- est.mean.2g(q_t[1], q_t[2], q_t[3], q_t[4], q_t[5],
                          q_c[1], q_c[2], q_c[3], q_c[4], q_c[5],
                          n.g1 = n_t,
                          n.g2 = n_c)
estmean_2g
## $mean.g1
## [1] 0.2330208
##
## $mean.g2
## [1] 0.08490334
# Estimate sample SD of S3
estsd_2g <- est.sd.2g(q_t[1], q_t[2], q_t[3], q_t[4], q_t[5],
                      q_c[1], q_c[2], q_c[3], q_c[4], q_c[5],
                      n.g1 = n_t,
                      n.g2 = n_c)
estsd_2g
## $sd.g1
## [1] 0.2587828
##
## $sd.g2
## [1] 0.07264839
When only three number summaries (S\(_1\) and S\(_2\)) are available, the corresponding inputs for the two groups can be used directly.
The ‘plotdist’ function estimates and visualizes the density curves of one or two groups (samples) using one of the quantile summary scenarios, i.e., either 3-point summaries (S\(_1\) or S\(_2\)) or 5-point summaries (S\(_3\)). It returns a customizable and interactive plotly object showing the estimated density curve(s) of individual studies as well as the pooled densities.
The input data to ‘plotdist’ should be a data frame containing the quantile summary data. For one-group studies, the input data frame can include the following columns: study.index, min.g1, q1.g1, med.g1, q3.g1, max.g1 and n.g1.
For two-group studies, the data frame can also contain the following columns for the summary data of the second group: min.g2, q1.g2, med.g2, q3.g2, max.g2 and n.g2.
If only 3-number summaries are available, only the respective columns for the 3-point summary should be included in the data frame.
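Since the scenario is implied by which columns are present, it can help to check a data frame before plotting. The helper below is a hypothetical convenience function, not part of metaquant, that infers the scenario for one group from the column naming pattern described above.

```r
# Hypothetical helper: infer the quantile summary scenario for one group
# from the column names of a plotdist-style data frame.
detect_scenario <- function(data, group = "g1") {
  has <- function(stat) paste0(stat, ".", group) %in% names(data)
  if (!has("med")) stop("median column is missing for group ", group)
  has_range     <- has("min") && has("max")
  has_quartiles <- has("q1") && has("q3")
  if (has_range && has_quartiles) {
    "S3"   # min, q1, med, q3, max
  } else if (has_range) {
    "S1"   # min, med, max
  } else if (has_quartiles) {
    "S2"   # q1, med, q3
  } else {
    stop("need min/max and/or q1/q3 columns for group ", group)
  }
}

# Made-up one-study example with quartiles only
detect_scenario(data.frame(study.index = "Study1", q1.g1 = 66,
                           med.g1 = 73, q3.g1 = 80, n.g1 = 226))
```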
For example, consider the following dataset which includes three one-group studies, each reporting 5-point summaries along with their respective sample sizes.
# Dataset of 5-point summaries for 1 group
data_s3 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  q1.g1 = c(66, 71, 69),
  med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200)
)
data_s3
##   study.index min.g1 q1.g1 med.g1 q3.g1 max.g1 n.g1
## 1      Study1     18    66     73    80    110  226
## 2      Study2     19    71     82    93    115  230
## 3      Study3     15    69     81    89    100  200
Then, using the data above, the density curves of the three studies can be visualised in the same plot using ‘plotdist’ as illustrated below.
# Plot densities
plot_s3 <- plotdist(
  data_s3,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot of S3",
  xlab = "x data",
  title.size = 11,
  lab.size = 10,
  color.g1 = "blue",
  display.index = FALSE,
  display.legend = FALSE
)
plot_s3
Note that the parameters ‘xmin’ and ‘xmax’ are numeric values giving the lower and upper limits of the x-axis over which the densities are calculated. To ensure the density curve is fully captured, it is recommended to set ‘xmin’ to a value smaller than the smallest minimum across the studies in the dataset, and ‘xmax’ to a value larger than the largest maximum. If these values are not provided, then for scenarios S\(_1\) and S\(_3\) the function uses the minimum of the ‘min.’ columns and the maximum of the ‘max.’ columns. For scenario S\(_2\) there are no ‘min.’ and ‘max.’ columns, so no default can be calculated and an error occurs; in that case ‘xmin’ and ‘xmax’ must be specified by the user.
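The recommendation above can be turned into a small helper that proposes limits slightly beyond the reported extremes. ‘suggest_xlim’ and its ‘pad’ argument are hypothetical names used only for illustration; the 5% padding is an arbitrary choice, not a package default.

```r
# Hypothetical helper: propose xmin / xmax by padding the reported extremes
# by a fraction of their span. Works for S1 and S3 data frames only.
suggest_xlim <- function(data, pad = 0.05) {
  min_cols <- grep("^min\\.", names(data), value = TRUE)
  max_cols <- grep("^max\\.", names(data), value = TRUE)
  if (length(min_cols) == 0 || length(max_cols) == 0) {
    stop("no 'min.' / 'max.' columns (scenario S2): choose xmin and xmax manually")
  }
  lo <- min(unlist(data[min_cols]))   # smallest minimum across studies and groups
  hi <- max(unlist(data[max_cols]))   # largest maximum across studies and groups
  span <- hi - lo
  c(xmin = lo - pad * span, xmax = hi + pad * span)
}

# Extremes of the three one-group studies used above
extremes <- data.frame(min.g1 = c(18, 19, 15), max.g1 = c(110, 115, 100))
suggest_xlim(extremes)
```

The two returned values can then be passed to the ‘xmin’ and ‘xmax’ arguments of ‘plotdist’.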
Suppose you have a dataset of quantile summaries of 3 studies, each reporting 3-point summaries of S\(_1\) along with their sample sizes.
# Dataset of 3-point summaries for 1 group
data_s1 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  med.g1 = c(73, 82, 81),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200)
)
data_s1
##   study.index min.g1 med.g1 max.g1 n.g1
## 1      Study1     18     73    110  226
## 2      Study2     19     82    115  230
## 3      Study3     15     81    100  200
Note that, in this case, the data frame consists only of the columns representing the minimum, median and maximum values, along with the study index and sample size. Then, the density curves of the three studies presenting 3-point summaries can be visualised using ‘plotdist’ as illustrated below.
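A sketch of such a call, reusing only the ‘plotdist’ arguments already shown for the S\(_3\) example; the plot title is an illustrative choice. The call is wrapped in a ‘requireNamespace’ check purely so that the snippet also runs where metaquant is not installed.

```r
# 3-point (S1) summaries for three one-group studies, as in the data frame above
data_s1 <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  med.g1 = c(73, 82, 81),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200)
)

# Plot the estimated densities (only if metaquant is available)
if (requireNamespace("metaquant", quietly = TRUE)) {
  plot_s1 <- metaquant::plotdist(
    data_s1,
    xmin = 10,
    xmax = 125,
    title = "Example Density Plot of S1",
    xlab = "x data",
    title.size = 11,
    lab.size = 10,
    color.g1 = "blue",
    display.index = FALSE,
    display.legend = FALSE
  )
  print(plot_s1)
}
```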
For instance, assume you have the following dataset of two-group studies, each reporting 5-point summaries for both group 1 and group 2 along with their respective sample sizes.
# Dataset of 5-point summaries for 2 groups
data_2g <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),
  q1.g1 = c(66, 71, 69),
  med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89),
  max.g1 = c(110, 115, 100),
  n.g1 = c(226, 230, 200),
  min.g2 = c(15, 15, 13),
  q1.g2 = c(57, 59, 55),
  med.g2 = c(66, 68, 60),
  q3.g2 = c(74, 72, 69),
  max.g2 = c(108, 101, 100),
  n.g2 = c(201, 223, 198)
)
data_2g
##   study.index min.g1 q1.g1 med.g1 q3.g1 max.g1 n.g1 min.g2 q1.g2 med.g2 q3.g2
## 1      Study1     18    66     73    80    110  226     15    57     66    74
## 2      Study2     19    71     82    93    115  230     15    59     68    72
## 3      Study3     15    69     81    89    100  200     13    55     60    69
##   max.g2 n.g2
## 1    108  201
## 2    101  223
## 3    100  198
Once you input the above data frame into ‘plotdist’ with the appropriate inputs, you can obtain the density curves of two groups, displayed in different colors you provide, within the same plot. Here, we use the default colors defined by the function.
# Plot densities
plot_2g <- plotdist(
  data_2g,
  xmin = 10,
  xmax = 125,
  title = "Example Density Plot of Two Groups",
  xlab = "x data",
  title.size = 11,
  label.g1 = "Treatment",
  label.g2 = "Control",
  display.index = FALSE,
  display.legend = TRUE
)
plot_2g
To display the legend labels, you need to provide the names of the groups for the ‘label.g1’ and ‘label.g2’ arguments and set ‘display.legend = TRUE’.
If you need to generate pooled density plots, you can set the ‘pooled.dist’ or ‘pooled.only’ arguments to ‘TRUE’. By default these arguments are set to ‘FALSE’. When ‘pooled.dist = TRUE’, the pooled density curves will be displayed along with the individual density curves and when ‘pooled.only = TRUE’, only the pooled density curves will be plotted, excluding the individual curves.
For example, pooled curves can be added to the previous plot ‘plot_2g’ as follows. Again, the default colors assigned to the two groups are used. You can customize the colors by using the ‘color.g1’, ‘color.g2’, ‘color.g1.pooled’, and ‘color.g2.pooled’ arguments.
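A sketch of that call, assuming the same two-group data as above and using only arguments named in this section; setting ‘pooled.dist = TRUE’ is the one change from the previous plot. The object name ‘plot_2g_pooled’ and the title are illustrative, and the ‘requireNamespace’ check only serves to let the snippet run where metaquant is not installed.

```r
# Two-group 5-point summaries, as in the data frame above
data_2g <- data.frame(
  study.index = c("Study1", "Study2", "Study3"),
  min.g1 = c(18, 19, 15),  q1.g1 = c(66, 71, 69),  med.g1 = c(73, 82, 81),
  q3.g1 = c(80, 93, 89),   max.g1 = c(110, 115, 100), n.g1 = c(226, 230, 200),
  min.g2 = c(15, 15, 13),  q1.g2 = c(57, 59, 55),  med.g2 = c(66, 68, 60),
  q3.g2 = c(74, 72, 69),   max.g2 = c(108, 101, 100), n.g2 = c(201, 223, 198)
)

# Individual and pooled density curves in one plot (only if metaquant is available)
if (requireNamespace("metaquant", quietly = TRUE)) {
  plot_2g_pooled <- metaquant::plotdist(
    data_2g,
    xmin = 10,
    xmax = 125,
    title = "Example Density Plot of Two Groups with Pooled Curves",
    xlab = "x data",
    title.size = 11,
    label.g1 = "Treatment",
    label.g2 = "Control",
    display.index = FALSE,
    display.legend = TRUE,
    pooled.dist = TRUE   # add pooled curves to the individual ones
  )
  print(plot_2g_pooled)
}
```

Replacing ‘pooled.dist = TRUE’ with ‘pooled.only = TRUE’ would, as described above, show only the pooled curves.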
For any queries, contact Alysha De Livera a.delivera@latrobe.edu.au or Udara Kumaranathunga u.kumaranathunga@latrobe.edu.au.
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8
## [3] LC_MONETARY=English_Australia.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.utf8
##
## time zone: Australia/Sydney
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] metaquant_0.1.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 jsonlite_1.8.9 dplyr_1.1.4 compiler_4.4.1
## [5] tidyselect_1.2.1 tidyr_1.3.1 jquerylib_0.1.4 scales_1.3.0
## [9] gld_2.6.6 yaml_2.3.10 fastmap_1.2.0 ggplot2_3.5.1
## [13] R6_2.5.1 labeling_0.4.3 generics_0.1.3 sld_1.0.1
## [17] knitr_1.48 htmlwidgets_1.6.4 tibble_3.2.1 estmeansd_1.0.1
## [21] munsell_0.5.1 bslib_0.8.0 pillar_1.9.0 rlang_1.1.4
## [25] utf8_1.2.4 cachem_1.1.0 xfun_0.49 sass_0.4.9
## [29] lazyeval_0.2.2 viridisLite_0.4.2 plotly_4.10.4 cli_3.6.3
## [33] magrittr_2.0.3 crosstalk_1.2.1 class_7.3-22 digest_0.6.37
## [37] grid_4.4.1 rstudioapi_0.16.0 lifecycle_1.0.4 vctrs_0.6.5
## [41] data.table_1.16.2 proxy_0.4-27 evaluate_1.0.0 glue_1.7.0
## [45] fansi_1.0.6 lmom_3.0 e1071_1.7-16 colorspace_2.1-1
## [49] purrr_1.0.2 httr_1.4.7 rmarkdown_2.29 tools_4.4.1
## [53] pkgconfig_2.0.3 htmltools_0.5.8.1
Alysha De Livera, Luke Prendergast, and Udara Kumaranathunga. A novel density-based approach for estimating unknown means, distribution visualisations, and meta-analyses of quantiles. Submitted for Review, 2024. (Article available on request to authors.)
Marshall Freimer, Georgia Kollia, Govind S. Mudholkar, and C. Thomas Lin. A study of the generalized Tukey lambda family. Communications in Statistics-Theory and Methods, 17(10):3547–3567, 1988.
P. J. van Staden and R. A. R. King. The quantile-based skew logistic distribution. Statistics & Probability Letters, 96:109–116, 2015.
Dehui Luo, Xiang Wan, Jiming Liu, and Tiejun Tong. Optimally estimating the sample mean from the sample size, median, mid-range, and/or mid-quartile range. Statistical methods in medical research, 27(6):1785–1805, 2018.
Sean McGrath, XiaoFei Zhao, Russell Steele, Brett D Thombs, Andrea Benedetti, and DEPRESsion Screening Data (DEPRESSD) Collaboration. Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis. Statistical methods in medical research, 29(9):2520–2537, 2020.
Jiandong Shi, Dehui Luo, Hong Weng, Xian-Tao Zeng, Lu Lin, Haitao Chu, and Tiejun Tong. Optimally estimating the sample standard deviation from the five-number summary. Research synthesis methods, 11(5):641–654, 2020.